An Exploratory Analysis of Covid Data from India

by Atonye Eben-Foby

Introduction

This notebook documents my exploratory analysis of Covid data from India. There are two datasets. The first contains information about the recorded Covid cases in India on a day-to-day basis. It contains 18110 observations with 9 variables each, including Date, State/UnionTerritory, ConfirmedIndianNational, ConfirmedForeignNational, Cured, Deaths and Confirmed. The second dataset contains information about Covid vaccinations in India. It contains 7845 observations with 24 variables each, including Updated On, State, Total Doses Administered, First Dose Administered, Second Dose Administered, and many others.

Preliminary Wrangling

In [1]:
#import all packages needed for the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
from datetime import datetime 
In [2]:
#load in your dataset with pandas
covid_df = pd.read_csv('covid_19_india.csv')
In [3]:
#get the first five rows of the data
covid_df.head()
Out[3]:
Sno Date Time State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured Deaths Confirmed
0 1 2020-01-30 6:00 PM Kerala 1 0 0 0 1
1 2 2020-01-31 6:00 PM Kerala 1 0 0 0 1
2 3 2020-02-01 6:00 PM Kerala 2 0 0 0 2
3 4 2020-02-02 6:00 PM Kerala 3 0 0 0 3
4 5 2020-02-03 6:00 PM Kerala 3 0 0 0 3
In [4]:
#get the summary of the data
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Sno                       18110 non-null  int64 
 1   Date                      18110 non-null  object
 2   Time                      18110 non-null  object
 3   State/UnionTerritory      18110 non-null  object
 4   ConfirmedIndianNational   18110 non-null  object
 5   ConfirmedForeignNational  18110 non-null  object
 6   Cured                     18110 non-null  int64 
 7   Deaths                    18110 non-null  int64 
 8   Confirmed                 18110 non-null  int64 
dtypes: int64(4), object(5)
memory usage: 1.2+ MB
In [5]:
#get the descriptive statistics of numeric variables
covid_df.describe()
Out[5]:
Sno Cured Deaths Confirmed
count 18110.000000 1.811000e+04 18110.000000 1.811000e+04
mean 9055.500000 2.786375e+05 4052.402264 3.010314e+05
std 5228.051023 6.148909e+05 10919.076411 6.561489e+05
min 1.000000 0.000000e+00 0.000000 0.000000e+00
25% 4528.250000 3.360250e+03 32.000000 4.376750e+03
50% 9055.500000 3.336400e+04 588.000000 3.977350e+04
75% 13582.750000 2.788698e+05 3643.750000 3.001498e+05
max 18110.000000 6.159676e+06 134201.000000 6.363442e+06
In [6]:
#load in the vaccine dataset
vaccine_df = pd.read_csv('covid_vaccine_statewise.csv')
In [7]:
#get the first five rows of the dataset
vaccine_df.head()
Out[7]:
Updated On State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) ... 18-44 Years (Doses Administered) 45-60 Years (Doses Administered) 60+ Years (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
0 16/01/2021 India 48276.0 3455.0 2957.0 48276.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 23757.0 24517.0 2.0 48276.0
1 17/01/2021 India 58604.0 8532.0 4954.0 58604.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 27348.0 31252.0 4.0 58604.0
2 18/01/2021 India 99449.0 13611.0 6583.0 99449.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 41361.0 58083.0 5.0 99449.0
3 19/01/2021 India 195525.0 17855.0 7951.0 195525.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 81901.0 113613.0 11.0 195525.0
4 20/01/2021 India 251280.0 25472.0 10504.0 251280.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 98111.0 153145.0 24.0 251280.0

5 rows × 24 columns

In [8]:
#get the summary of the data
vaccine_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7845 entries, 0 to 7844
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Updated On                           7845 non-null   object 
 1   State                                7845 non-null   object 
 2   Total Doses Administered             7621 non-null   float64
 3   Sessions                             7621 non-null   float64
 4    Sites                               7621 non-null   float64
 5   First Dose Administered              7621 non-null   float64
 6   Second Dose Administered             7621 non-null   float64
 7   Male (Doses Administered)            7461 non-null   float64
 8   Female (Doses Administered)          7461 non-null   float64
 9   Transgender (Doses Administered)     7461 non-null   float64
 10   Covaxin (Doses Administered)        7621 non-null   float64
 11  CoviShield (Doses Administered)      7621 non-null   float64
 12  Sputnik V (Doses Administered)       2995 non-null   float64
 13  AEFI                                 5438 non-null   float64
 14  18-44 Years (Doses Administered)     1702 non-null   float64
 15  45-60 Years (Doses Administered)     1702 non-null   float64
 16  60+ Years (Doses Administered)       1702 non-null   float64
 17  18-44 Years(Individuals Vaccinated)  3733 non-null   float64
 18  45-60 Years(Individuals Vaccinated)  3734 non-null   float64
 19  60+ Years(Individuals Vaccinated)    3734 non-null   float64
 20  Male(Individuals Vaccinated)         160 non-null    float64
 21  Female(Individuals Vaccinated)       160 non-null    float64
 22  Transgender(Individuals Vaccinated)  160 non-null    float64
 23  Total Individuals Vaccinated         5919 non-null   float64
dtypes: float64(22), object(2)
memory usage: 1.4+ MB
In [9]:
#no null values exist for the dataset
covid_df.isnull().sum()
Out[9]:
Sno                         0
Date                        0
Time                        0
State/UnionTerritory        0
ConfirmedIndianNational     0
ConfirmedForeignNational    0
Cured                       0
Deaths                      0
Confirmed                   0
dtype: int64
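
The case data has no nulls, but the `info()` output for the vaccine data above shows many columns with far fewer than 7845 non-null entries. A minimal sketch of how the share of missing values per column could be summarised is given here. The frame `toy_vaccine` and its values are made up for illustration and only stand in for `vaccine_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for vaccine_df; the values are made up.
toy_vaccine = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "Total Doses Administered": [10.0, 20.0, np.nan, 5.0],
    "AEFI": [np.nan, np.nan, np.nan, 1.0],
})

# isnull() gives a boolean frame; its column means are the missing fractions.
missing_pct = toy_vaccine.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

Sorting the result puts the sparsest columns first, which makes it easier to decide which ones to drop.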
In [10]:
#drop columns that would not be used in the analysis
covid_df.drop(['Sno', 'Time', 'ConfirmedIndianNational','ConfirmedForeignNational'], inplace=True, axis=1)
In [11]:
#check that columns were dropped
covid_df.head()
Out[11]:
Date State/UnionTerritory Cured Deaths Confirmed
0 2020-01-30 Kerala 0 0 1
1 2020-01-31 Kerala 0 0 1
2 2020-02-01 Kerala 0 0 2
3 2020-02-02 Kerala 0 0 3
4 2020-02-03 Kerala 0 0 3
In [12]:
#change the date variable to date time format
covid_df['Date'] = pd.to_datetime(covid_df['Date'], format = '%Y-%m-%d')
In [13]:
#check that the format has been changed
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Date                  18110 non-null  datetime64[ns]
 1   State/UnionTerritory  18110 non-null  object        
 2   Cured                 18110 non-null  int64         
 3   Deaths                18110 non-null  int64         
 4   Confirmed             18110 non-null  int64         
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 707.5+ KB
In [14]:
#create a new column to show the number of active Cases 
covid_df['Active_Cases']= covid_df['Confirmed']-(covid_df['Cured']+covid_df['Deaths'])
In [15]:
#check that the column has been added
covid_df.head()
Out[15]:
Date State/UnionTerritory Cured Deaths Confirmed Active_Cases
0 2020-01-30 Kerala 0 0 1 1
1 2020-01-31 Kerala 0 0 1 1
2 2020-02-01 Kerala 0 0 2 2
3 2020-02-02 Kerala 0 0 3 3
4 2020-02-03 Kerala 0 0 3 3
In [16]:
#replace wrongly spelt state names with their correct names
covid_df['State/UnionTerritory'] = covid_df['State/UnionTerritory'].replace({'Maharashtra***': 'Maharashtra', 'Karanataka': 'Karnataka'})
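
The statewise table later in this notebook still lists a few variant spellings as separate rows (for example `Madhya Pradesh***`, `Bihar****`, `Telengana` and `Himanchal Pradesh`). A possible extension of the cleanup is sketched here on a made-up frame. The mapping `name_fixes` is my assumption about which names belong together, and applying it to the real data would change the figures reported in this notebook.

```python
import pandas as pd

# Assumed mapping from variant spellings to a single canonical name.
name_fixes = {
    "Maharashtra***": "Maharashtra",
    "Karanataka": "Karnataka",
    "Madhya Pradesh***": "Madhya Pradesh",
    "Bihar****": "Bihar",
    "Telengana": "Telangana",
    "Himanchal Pradesh": "Himachal Pradesh",
}

# Hypothetical toy column standing in for covid_df['State/UnionTerritory'].
toy = pd.DataFrame({"State/UnionTerritory": ["Bihar****", "Bihar", "Telengana", "Kerala"]})

# A dict passed to replace() maps every listed variant in one pass.
toy["State/UnionTerritory"] = toy["State/UnionTerritory"].replace(name_fixes)
cleaned = sorted(toy["State/UnionTerritory"].unique())
print(cleaned)
```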
In [17]:
#create a pivot table to show the number of confirmed, death and cured cases in each state
statewise = pd.pivot_table(covid_df, values = ['Confirmed', 'Deaths', 'Cured'], index='State/UnionTerritory', aggfunc = 'max')
In [18]:
#create a column to calculate the recovery rate
statewise['Recovery_Rate'] = statewise['Cured']*100/statewise['Confirmed']
In [19]:
#create a column to create the mortality rate
statewise['Mortality_Rate'] = statewise['Deaths']*100/statewise['Confirmed']
In [20]:
#sort the dataset by a descending order of confirmed cases
statewise = statewise.sort_values(by = 'Confirmed', ascending = False)
In [21]:
#check that all changes have been reflected in the pivot table
statewise.head()
Out[21]:
Confirmed Cured Deaths Recovery_Rate Mortality_Rate
State/UnionTerritory
Maharashtra 6363442 6159676 134201 96.797865 2.108937
Kerala 3586693 3396184 18004 94.688450 0.501967
Karnataka 2921049 2861499 36848 97.961349 1.261465
Tamil Nadu 2579130 2524400 34367 97.877967 1.332504
Andhra Pradesh 1985182 1952736 13564 98.365591 0.683262
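
Taking the maximum of each column works when the counts only ever grow. If a state's figures were revised downward at some point, the per-column maximum could mix values from different dates. An alternative is to take each state's most recent row, sketched here on made-up data. The frame `toy` and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; state B is revised downward on the second day.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"]),
    "State/UnionTerritory": ["A", "A", "B", "B"],
    "Confirmed": [10, 15, 7, 6],
    "Cured": [5, 9, 3, 4],
    "Deaths": [1, 1, 0, 0],
})

# Sort by date, then keep the last (most recent) row of every state.
latest = (
    toy.sort_values("Date")
       .groupby("State/UnionTerritory")
       .last()
       .drop(columns="Date")
)
latest["Recovery_Rate"] = latest["Cured"] * 100 / latest["Confirmed"]
print(latest)
```

For state B the per-column maximum would pair 7 confirmed cases with 4 cured, while the latest row pairs 6 with 4.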
In [22]:
# change a column name to be more descriptive
vaccine_df.rename(columns = {'Updated On' : 'Vaccine_Date'}, inplace=True)
In [23]:
#check that the change has been effected
vaccine_df.head()
Out[23]:
Vaccine_Date State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) ... 18-44 Years (Doses Administered) 45-60 Years (Doses Administered) 60+ Years (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
0 16/01/2021 India 48276.0 3455.0 2957.0 48276.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 23757.0 24517.0 2.0 48276.0
1 17/01/2021 India 58604.0 8532.0 4954.0 58604.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 27348.0 31252.0 4.0 58604.0
2 18/01/2021 India 99449.0 13611.0 6583.0 99449.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 41361.0 58083.0 5.0 99449.0
3 19/01/2021 India 195525.0 17855.0 7951.0 195525.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 81901.0 113613.0 11.0 195525.0
4 20/01/2021 India 251280.0 25472.0 10504.0 251280.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 98111.0 153145.0 24.0 251280.0

5 rows × 24 columns
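
The `Vaccine_Date` values are still day-first strings such as `16/01/2021`. If the vaccine data were to be analysed over time, the column could be converted in the same way as the case dates, but with a day-first format string. This is a small sketch on a made-up frame, not a step the analysis below depends on.

```python
import pandas as pd

# Hypothetical toy column standing in for vaccine_df['Vaccine_Date'].
toy = pd.DataFrame({"Vaccine_Date": ["16/01/2021", "17/01/2021", "01/02/2021"]})

# An explicit day/month/year format avoids 01/02/2021 being read as January 2nd.
toy["Vaccine_Date"] = pd.to_datetime(toy["Vaccine_Date"], format="%d/%m/%Y")
print(toy.dtypes)
```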

In [24]:
#drop columns that won't be used for the analysis
vaccination = vaccine_df.drop(columns = ['Sputnik V (Doses Administered)', 'AEFI', '18-44 Years (Doses Administered)', '45-60 Years (Doses Administered)', '60+ Years (Doses Administered)'])

I dropped the Sputnik V column because only about 0.015% of the vaccinated individuals received this vaccine, and the AEFI column because it is not used in this analysis. I dropped the age-group doses-administered columns because I am more concerned with the number of individuals vaccinated than with the number of doses each of them was given.

In [25]:
#check that the columns have been dropped
vaccination.head()
Out[25]:
Vaccine_Date State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) Covaxin (Doses Administered) CoviShield (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
0 16/01/2021 India 48276.0 3455.0 2957.0 48276.0 0.0 NaN NaN NaN 579.0 47697.0 NaN NaN NaN 23757.0 24517.0 2.0 48276.0
1 17/01/2021 India 58604.0 8532.0 4954.0 58604.0 0.0 NaN NaN NaN 635.0 57969.0 NaN NaN NaN 27348.0 31252.0 4.0 58604.0
2 18/01/2021 India 99449.0 13611.0 6583.0 99449.0 0.0 NaN NaN NaN 1299.0 98150.0 NaN NaN NaN 41361.0 58083.0 5.0 99449.0
3 19/01/2021 India 195525.0 17855.0 7951.0 195525.0 0.0 NaN NaN NaN 3017.0 192508.0 NaN NaN NaN 81901.0 113613.0 11.0 195525.0
4 20/01/2021 India 251280.0 25472.0 10504.0 251280.0 0.0 NaN NaN NaN 3946.0 247334.0 NaN NaN NaN 98111.0 153145.0 24.0 251280.0
In [26]:
#dropping rows where state is India
vaccination = vaccination[vaccination.State != 'India']
vaccination.head()
Out[26]:
Vaccine_Date State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) Covaxin (Doses Administered) CoviShield (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
212 16/01/2021 Andaman and Nicobar Islands 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 NaN NaN NaN NaN NaN NaN 23.0
213 17/01/2021 Andaman and Nicobar Islands 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 NaN NaN NaN NaN NaN NaN 23.0
214 18/01/2021 Andaman and Nicobar Islands 42.0 9.0 2.0 42.0 0.0 29.0 13.0 0.0 0.0 42.0 NaN NaN NaN NaN NaN NaN 42.0
215 19/01/2021 Andaman and Nicobar Islands 89.0 12.0 2.0 89.0 0.0 53.0 36.0 0.0 0.0 89.0 NaN NaN NaN NaN NaN NaN 89.0
216 20/01/2021 Andaman and Nicobar Islands 124.0 16.0 3.0 124.0 0.0 67.0 57.0 0.0 0.0 124.0 NaN NaN NaN NaN NaN NaN 124.0
In [27]:
#rename one of the variable names
vaccination.rename(columns ={'Total Individuals Vaccinated' : 'Total'}, inplace=True)
vaccination.head()
Out[27]:
Vaccine_Date State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) Covaxin (Doses Administered) CoviShield (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total
212 16/01/2021 Andaman and Nicobar Islands 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 NaN NaN NaN NaN NaN NaN 23.0
213 17/01/2021 Andaman and Nicobar Islands 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 NaN NaN NaN NaN NaN NaN 23.0
214 18/01/2021 Andaman and Nicobar Islands 42.0 9.0 2.0 42.0 0.0 29.0 13.0 0.0 0.0 42.0 NaN NaN NaN NaN NaN NaN 42.0
215 19/01/2021 Andaman and Nicobar Islands 89.0 12.0 2.0 89.0 0.0 53.0 36.0 0.0 0.0 89.0 NaN NaN NaN NaN NaN NaN 89.0
216 20/01/2021 Andaman and Nicobar Islands 124.0 16.0 3.0 124.0 0.0 67.0 57.0 0.0 0.0 124.0 NaN NaN NaN NaN NaN NaN 124.0

Exploration and Visualization

In [28]:
#create a heatmap of the covid dataset
statewise.style.background_gradient(cmap = 'viridis_r')
Out[28]:
  Confirmed Cured Deaths Recovery_Rate Mortality_Rate
State/UnionTerritory          
Maharashtra 6363442 6159676 134201 96.797865 2.108937
Kerala 3586693 3396184 18004 94.688450 0.501967
Karnataka 2921049 2861499 36848 97.961349 1.261465
Tamil Nadu 2579130 2524400 34367 97.877967 1.332504
Andhra Pradesh 1985182 1952736 13564 98.365591 0.683262
Uttar Pradesh 1708812 1685492 22775 98.635309 1.332797
West Bengal 1534999 1506532 18252 98.145471 1.189056
Delhi 1436852 1411280 25068 98.220276 1.744647
Chhattisgarh 1003356 988189 13544 98.488373 1.349870
Odisha 988997 972710 6565 98.353180 0.663804
Rajasthan 953851 944700 8954 99.040626 0.938721
Gujarat 825085 814802 10077 98.753704 1.221329
Madhya Pradesh 791980 781330 10514 98.655269 1.327559
Madhya Pradesh*** 791656 780735 10506 98.620487 1.327092
Haryana 770114 759790 9652 98.659419 1.253321
Bihar 725279 715352 9646 98.631285 1.329971
Bihar**** 715730 701234 9452 97.974655 1.320610
Telangana 650353 638410 3831 98.163613 0.589065
Punjab 599573 582791 16322 97.201008 2.722271
Assam 576149 559684 5420 97.142232 0.940729
Telengana 443360 362160 2312 81.685312 0.521472
Jharkhand 347440 342102 5130 98.463620 1.476514
Uttarakhand 342462 334650 7368 97.718871 2.151480
Jammu and Kashmir 322771 317081 4392 98.237140 1.360717
Himachal Pradesh 208616 202761 3537 97.193408 1.695460
Himanchal Pradesh 204516 200040 3507 97.811418 1.714780
Goa 172085 167978 3164 97.613389 1.838626
Puducherry 121766 119115 1800 97.822873 1.478245
Manipur 105424 96776 1664 91.796934 1.578388
Tripura 80660 77811 773 96.467890 0.958344
Meghalaya 69769 64157 1185 91.956313 1.698462
Chandigarh 61992 61150 811 98.641760 1.308233
Arunachal Pradesh 50605 47821 248 94.498567 0.490070
Mizoram 46320 33722 171 72.802245 0.369171
Nagaland 28811 26852 585 93.200514 2.030474
Sikkim 28018 25095 356 89.567421 1.270612
Ladakh 20411 20130 207 98.623291 1.014159
Dadra and Nagar Haveli and Daman and Diu 10654 10646 4 99.924911 0.037545
Dadra and Nagar Haveli 10377 10261 4 98.882143 0.038547
Lakshadweep 10263 10165 51 99.045114 0.496931
Cases being reassigned to states 9265 0 0 0.000000 0.000000
Andaman and Nicobar Islands 7548 7412 129 98.198198 1.709062
Unassigned 77 0 0 0.000000 0.000000
Daman & Diu 2 0 0 0.000000 0.000000

Observation: The state with the highest mortality rate is Punjab, and the state with the lowest recovery rate is Mizoram (ignoring the placeholder rows that report zero recoveries).

What are the top 10 states with the most active cases in India?

In [29]:
#group the dataset by state and sort it by a descending number of active cases
top_10_active_states= covid_df.groupby(by ='State/UnionTerritory').max()[['Active_Cases', 'Date']].sort_values(by = ['Active_Cases'],ascending=False).reset_index()
In [30]:
#make a plot to show the top 10 states with the most active cases
fig = plt.figure(figsize=(16,9))
plt.title('Top 10 States with most active cases in India', fontsize=15)
ax = sns.barplot(data=top_10_active_states.iloc[:10], y='Active_Cases', x='State/UnionTerritory', linewidth=1, color =sns.color_palette()[0], edgecolor ='black')
plt.xlabel('States')
plt.ylabel('Total Active Cases')
Out[30]:
Text(0, 0.5, 'Total Active Cases')

Observation: Maharashtra had the highest peak number of active cases, at roughly 700,000, followed by Karnataka at roughly 600,000, as read from the plot.
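
The ranking above comes from each state's maximum number of active cases. A more compact way to get the same kind of ranking is sketched here with `nlargest`, on a made-up frame. The names `toy` and `peak_active` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for covid_df.
toy = pd.DataFrame({
    "State/UnionTerritory": ["A", "A", "B", "B", "C"],
    "Active_Cases": [5, 9, 12, 3, 7],
})

# Peak active cases per state, then the two largest peaks in descending order.
peak_active = toy.groupby("State/UnionTerritory")["Active_Cases"].max().nlargest(2)
print(peak_active)
```

Selecting the column before calling `max()` avoids computing the maximum of columns that are not needed.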

What are the top 10 states with the highest number of deaths in India?

In [31]:
#group the dataset and sort it by the descending order of number of death cases
top_10_deaths = covid_df.groupby(by ='State/UnionTerritory').max()[['Deaths', 'Date']].sort_values(by = ['Deaths'],ascending=False).reset_index()
In [32]:
#make a plot to show the top ten states with most deaths in India
fig = plt.figure(figsize=(16,9))
plt.title('Top 10 States with most deaths in India', fontsize=15)
ax = sns.barplot(data=top_10_deaths.iloc[:10], y='Deaths', x='State/UnionTerritory', linewidth=1, color =sns.color_palette()[0], edgecolor = 'black')
plt.xlabel('States')
plt.ylabel('Total Deaths')
Out[32]:
Text(0, 0.5, 'Total Deaths')

Observation: Maharashtra recorded the most deaths (134,201), followed by Karnataka (36,848), as the plot shows.

What is the growth trend in active cases for the top 5 states with the most deaths?

In [33]:
#make a plot to show the growth trend for the top 5 most affected states

fig = plt.figure(figsize=(12,6))

ax=sns.lineplot(data=covid_df[covid_df['State/UnionTerritory'].isin(['Maharashtra','Karnataka', 'Tamil Nadu', 'Delhi', 'Uttar Pradesh'])], x='Date', y='Active_Cases', hue = 'State/UnionTerritory')
ax.set_title('Top 5 Affected States in India', size=15)
Out[33]:
Text(0.5, 1.0, 'Top 5 Affected States in India')

Observation: Active cases rose in September 2020 in every state except Delhi, and all five states saw an extreme spike in active cases in May 2021.
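
The line plot shows the level of active cases. To look at growth more directly, the day-to-day change in the cumulative `Confirmed` count could be computed per state. This is a sketch on made-up data, and the column name `New_Cases` is my own.

```python
import pandas as pd

# Hypothetical toy data: four consecutive days for two states.
toy = pd.DataFrame({
    "Date": pd.date_range("2021-05-01", periods=4).tolist() * 2,
    "State/UnionTerritory": ["A"] * 4 + ["B"] * 4,
    "Confirmed": [10, 14, 20, 30, 5, 5, 8, 9],
})

# Sort so that each state's rows are in date order before differencing.
toy = toy.sort_values(["State/UnionTerritory", "Date"])

# diff() within each state turns cumulative counts into daily increases;
# the first day of each state has no previous value and becomes NaN.
toy["New_Cases"] = toy.groupby("State/UnionTerritory")["Confirmed"].diff()
print(toy)
```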

In [34]:
#make a pie chart to show the percentage of each gender that has been vaccinated

male = vaccine_df['Male(Individuals Vaccinated)'].sum()
female = vaccine_df['Female(Individuals Vaccinated)'].sum()
px.pie(names=['Male', 'Female'], values=[male,female], title='Male and Female Vaccination')

I did not represent transgender individuals in the pie chart because they account for only about 0.01% of the vaccinated individuals, which is negligible.

Observation: About 53% of the individuals vaccinated in India were male and about 47% were female.

In [35]:
#make a pie chart to show the share of doses administered for each vaccine

#note: the Covaxin column name in the raw data starts with a space
covaxin = vaccination[' Covaxin (Doses Administered)'].sum()
covishield = vaccination['CoviShield (Doses Administered)'].sum()
px.pie(names=['Covaxin', 'CoviShield'], values=[covaxin, covishield], title='Share of Doses Administered: Covaxin vs CoviShield')

Observation: CoviShield accounted for 88.6% of the doses administered and Covaxin for 11.4%.

In [36]:
#make a donut chart to show the share of vaccinated individuals in each age group

Age18_to_44 = vaccination['18-44 Years(Individuals Vaccinated)'].sum()
Age45_to_60= vaccination['45-60 Years(Individuals Vaccinated)'].sum()
Above_60= vaccination['60+ Years(Individuals Vaccinated)'].sum()

px.pie(names=['18-44', '45-60', '60+'], values=[Age18_to_44, Age45_to_60, Above_60], title='Distribution of Covid Vaccines among Age Groups', hole = .5)

Observation: 42% of the vaccinated individuals were 45-60 years old, 37.9% were above 60, and only 20.1% were 18-44.
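
The percentages in these observations were read off the pie charts. They could also be computed directly from the summed counts, which makes them easier to quote exactly. A small sketch with made-up counts follows. The Series `counts` is hypothetical and does not contain the real totals.

```python
import pandas as pd

# Made-up counts per age group, standing in for the summed columns.
counts = pd.Series({"18-44": 20, "45-60": 50, "60+": 30})

# Each group's share of the overall total, as a percentage rounded to 1 decimal.
share_pct = (counts / counts.sum() * 100).round(1)
print(share_pct)
```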

In [37]:
#group the vaccine dataset by state and sort in descending order the total number of vaccinations for each state
max_vac = vaccination.groupby('State')['Total'].sum().to_frame('Total')
max_vac = max_vac.sort_values('Total', ascending = False)[:5]
max_vac
Out[37]:
Total
State
Maharashtra 1.403075e+09
Uttar Pradesh 1.200575e+09
Rajasthan 1.141163e+09
Gujarat 1.078261e+09
West Bengal 9.250227e+08
In [38]:
#make a plot to show the top five vaccinated states
fig = plt.figure(figsize=(10,5))
plt.title('Top 5 Vaccinated States in India', size = 15)
x = sns.barplot(data=max_vac, y = max_vac['Total'], x=max_vac.index, color=sns.color_palette()[0], edgecolor = 'black')

Observation: Maharashtra was the most vaccinated state, followed by Uttar Pradesh, as the plot shows.
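
One caveat on the totals used here: the first rows shown earlier for Andaman and Nicobar Islands (23, 23, 42, 89, 124) suggest that `Total` is a running count rather than a daily count. If that reading is right, summing over dates counts the same individuals many times, and taking each state's latest value would give a more realistic total. The sketch below contrasts the two on made-up data. The names `toy`, `summed` and `latest_total` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with running (cumulative) totals per state.
toy = pd.DataFrame({
    "Vaccine_Date": pd.to_datetime(
        ["16/01/2021", "17/01/2021", "16/01/2021", "17/01/2021"],
        format="%d/%m/%Y",
    ),
    "State": ["A", "A", "B", "B"],
    "Total": [23.0, 42.0, 100.0, 110.0],
})

# Summing a running total adds up every day's count again and again.
summed = toy.groupby("State")["Total"].sum()

# Taking the last value by date keeps only the most recent running total.
latest_total = toy.sort_values("Vaccine_Date").groupby("State")["Total"].last()
print(summed, latest_total, sep="\n")
```

The ordering of states can stay the same under both approaches, but the magnitudes differ considerably.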

In [39]:
#group the vaccine dataset by state and sort in ascending order the total number of vaccinations for each state
min_vac = vaccination.groupby('State')['Total'].sum().to_frame('Total')
min_vac = min_vac.sort_values('Total', ascending = True)[:5]
min_vac
Out[39]:
Total
State
Lakshadweep 2124715.0
Andaman and Nicobar Islands 8102125.0
Ladakh 9466289.0
Dadra and Nagar Haveli and Daman and Diu 11358600.0
Sikkim 16136752.0
In [40]:
#make the plot to show the least five vaccinated states in the dataset
fig = plt.figure(figsize=(13,8))
plt.title('Least 5 Vaccinated States in India', size = 15)
x = sns.barplot(data=min_vac, y = min_vac['Total'], x=min_vac.index, color=sns.color_palette()[0], edgecolor = 'black')

Observation: Lakshadweep was the least vaccinated state, followed by Andaman and Nicobar Islands, as the plot shows.

In [41]:
statewise.to_csv('C:/Users/hp/Desktop/statewise.csv')
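
The export above writes to an absolute path on my machine, so it would fail elsewhere. A more portable variant could write to a folder relative to the notebook, as sketched here. The folder name `output` and the frame `toy` are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd

# Hypothetical output folder, created next to the notebook if it is missing.
out_dir = Path("output")
out_dir.mkdir(parents=True, exist_ok=True)

# Made-up frame standing in for the statewise table.
toy = pd.DataFrame(
    {"Confirmed": [1]},
    index=pd.Index(["A"], name="State/UnionTerritory"),
)

# Build the file path with pathlib so it works on any operating system.
out_path = out_dir / "statewise.csv"
toy.to_csv(out_path)
```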

Summary of Findings:

  • The state with the highest mortality rate is Punjab, and the state with the lowest recovery rate is Mizoram.
  • The top 10 states by peak active cases are Maharashtra, Karnataka, Kerala, Tamil Nadu, Uttar Pradesh, Rajasthan, Andhra Pradesh, Gujarat, West Bengal and Chhattisgarh, in that order.
  • The top 10 states by number of deaths are Maharashtra, Karnataka, Tamil Nadu, Delhi, Uttar Pradesh, West Bengal, Kerala, Punjab, Andhra Pradesh and Chhattisgarh, in that order.
  • Active cases rose in September 2020 in every one of the top five states except Delhi, and all five states saw an extreme spike in active cases in May 2021.
  • About 53% of the individuals vaccinated in India were male and about 47% were female.
  • CoviShield accounted for 88.6% of the doses administered and Covaxin for 11.4%.
  • 42% of the vaccinated individuals were 45-60 years old, 37.9% were above 60, and only 20.1% were 18-44.
  • The top 5 most vaccinated states are Maharashtra, Uttar Pradesh, Rajasthan, Gujarat and West Bengal, in that order.
  • The least vaccinated states were Lakshadweep, Andaman and Nicobar Islands, Ladakh, Dadra and Nagar Haveli and Daman and Diu, and Sikkim, in that order.